Two reasons that we’ll be most interested in (to start):
It’s a great way to safely store organized versions of a project
It’s ideal for data science collaborators who need to share & edit code together
Watch this video before or after today’s session
Repositories (repos) on GitHub are the same unit as an RStudio Project - it’s a place where you can easily store all information/data/etc. related to whatever project you’re working on.
When we create a Repository in GitHub and have it communicating with a Project in RStudio, then we can get (pull) information from GitHub to RStudio, or push information from RStudio to GitHub where it is safely stored and/or our collaborators can access it. It also keeps a complete history of updated versions that can be accessed/reviewed by you and your collaborators at any time, from anywhere, as long as you have the internet.
A couple of general tips:
Pull frequently (if working with anyone else, and when you start working on a project again, even if on the same device)
Commit/push in small, meaningful increments and with useful (searchable, descriptive) Commit messages
The best way to deal with merge conflicts is to avoid creating merge conflicts. This is what happens when two people pull a version, work on it separately and then try to push back to the repo at the same time Communicate with collaborators so you’re not all working on the same piece of code at the same time.
PART 1. Fork & Clone an existing repo
PART 2. Create your own repo and version controlled R project from scratch
PART 3. Use the GitHub Classroom to fork a project, organise folders, run a script
END Make sure you complete this worksheet it will be vital for submitting your second summative assignment.
Become a master of reproducible research & version control with GitHub
Improve project structure
Write informative notes for collaborators & archiving
Go to github.com and log in (you need your own account - for sign up with your uea.ac.uk e-mail)
In the Search bar, look for repo Philip-Leftwich/5023Y-Happy-Git
Click on the repo name, and look at the existing repo structure
FORK the repo
Press Clone/download, copy the URL, and create a new project from Git repository in RStudio (add your URL) (Note you may be asked to enable Git and/or asked to provide your GitHub username and password)
Open the some_cool_animals.Rmd document, and the accompanying html
Add your name to the top of the document
BUT WAIT. We have forgotten to add a great image and facts about a very important species - Baby Yoda, including an image
Also known as “The Child”
likes unfertilised frog eggs & control knobs
strong with the force
Staged - pick those files which you intend to bind to a commit
Commit - write a short descriptive message, binds changes to a single commit
Push - “Pushes” your changes from the local repo to the remote repo on GitHub.
Today we will have a bit of a play with tidyverse tools and plotting to refresh your memory!
Remember if your code doesn’t run, it is usually a simple fix. Read the error message carefully and see how many of these bingo tiles you can pick up!
Go back to your GitHub account
Click on the plus sign (upper right, by your profile pic/icon) to create a new repository
Name the repo 5023Y-second-repo-yourinitials (like 5023Y-second-repo-PL), and select to initialize with a ReadMe
Edit the ReadMe (however you want - notice that markdown formatting works & you can see a preview) & commit
Some tips:
Include a title, introduction/objectives
Clone to create a connected R Project in RStudio
Create a new R Markdown document
Attach the {tidyverse},{palmerpenguins} and {plotly} packages in a hidden code chunk (include = FALSE)
Create an interactive plot of PalmerPenguins with {plotly}, showing the output but not the code or messages
penguin_graph <- ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_point(aes(size = body_mass_g,
color = species),
alpha = 0.4) +
scale_color_manual(values = c("purple","orange","black")) +
theme_minimal() +
labs(x = "bill length (mm)",
y = "bill depth mm",
title = "Penguin measurements")
ggplotly(penguin_graph, tooltip = c("species"))BLURB ON CLASSROOM
Follow this invite link
You will be invited to sign-in to Github (if not already) & to join the UEABIO organisation
Clone your assignment to work locally in RStudio
In your local project folder, create subfolders ‘data’ and ‘final_graphs’ (Note use the dir.create commands in the console)
Drop the file disease_burden.csv into the ‘data’ subfolder (Note unlike in Bash there is no base R command to move files so do this using the RStudio pane)
Open a new .R script
Attach the {tidyverse} and {janitor} package
Read in the disease_burden.csv data
This is a file look at the death rate for every country in the world across four decades.
Use %>% and filter to pull out “United States”, “Japan”, “Afghanistan”& “Somalia”
We want to look at death for both sexes at 0-6 days so use filter here as well.
Assign this to a new object
library(tidyverse)
library(janitor)
db <- read_csv("data/disease_burden.csv") %>%
clean_names() %>%
rename(deaths_per_100k = death_rate_per_100_000)
# View(db)
# Subset (US, Japan = lowest infant death rates, Afghanistan = highest infant death rates)
db_sub <- db %>%
filter(country_name %in% c("United States", "Japan", "Afghanistan", "Somalia")) %>%
filter(age_group == "0-6 days", sex == "Both")HINT - use geom_line() and remember to separate countries by colour
ggplot(data = db_sub) +
geom_line(aes(x = year,
y = deaths_per_100k,
color = country_name)) +
scale_color_manual(values = c("black", "blue", "magenta", "orange"))annotate) and vertical or horizontal lines with geom_vline or geom_hline# Graph
# New things: annotation + vertical line
ggplot(data = db_sub) +
geom_line(aes(x = year,
y = deaths_per_100k,
color = country_name)) +
scale_color_manual(values = c("black", "blue", "magenta", "orange")) +
annotate(geom = "text",
x = 1985,
y = 2.2e5,
label = "Afghanistan",
size = 2.5) +
geom_vline(xintercept = 2000,
lty = 2) +
theme_minimal()Save, stage, commit & push
Check that changes are stored on GitHub
(NOTE this will be in your organisations rather than repos)
Make sure you finish all three exercises before next week to become a GitHub pro!!!!!!!!!
** Want to learn about all things GitHub and R? (https://happygitwithr.com/)